Learning spoken document similarity and recommendation using supervised probabilistic latent semantic analysis
Abstract
This paper presents a model-based approach to spoken document similarity called Supervised Probabilistic Latent Semantic Analysis (PLSA). The method differs from traditional spoken document similarity techniques in that it allows similarity to be learned rather than approximated. The ability to learn similarity is desirable in applications such as Internet video recommendation, in which complex relationships like user preference or speaking style need to be predicted. The proposed method exploits prior knowledge of document relationships to learn similarity. Experiments on broadcast news and Internet video corpora yielded 16.2% and 9.7% absolute gains in mean average precision (mAP) over traditional PLSA. Additionally, a cascaded Supervised+Discriminative PLSA system achieved a 3.0% absolute mAP gain over a Discriminative PLSA system, demonstrating the complementary nature of Supervised and Discriminative PLSA training.
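The abstract reports its results as absolute mAP gains. As a minimal sketch of how that metric is commonly computed (the paper's exact evaluation protocol is not given here, so this follows the standard definition, and the function names are illustrative assumptions), each query document produces a ranked list of candidates marked relevant or not:

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True if the document at rank i + 1 is relevant.
    Assumes the ranking covers the whole collection, so every relevant
    document appears somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of per-query average precision over all queries."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


# Hypothetical rankings for two queries (illustrative data only).
example = [[True, False, True], [False, True]]
print(f"mAP = {mean_average_precision(example):.4f}")
```

Under this definition, an "absolute gain" is the plain difference between two mAP values expressed in percentage points, not a relative improvement.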
Similar resources
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing natural-language answers using computational methods and machine learning algorithms. The development of large-scale smart education systems on the one hand, and the importance of assessment as a key factor in the learning process along with the challenges it faces on the other, have significantly increased the need for ...
Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms
This paper proposes an improved approach for spoken lecture summarization, in which random walk is performed on a graph constructed with automatically extracted key terms and probabilistic latent semantic analysis (PLSA). Each sentence of the document is represented as a node of the graph and the edge between two nodes is weighted by the topical similarity between the two sentences. The basic i...
Hierarchical topic organization and visual presentation of spoken documents using probabilistic latent semantic analysis (PLSA) for efficient retrieval/browsing applications
The most attractive form of future network content will be multimedia that includes speech information, and such speech information usually carries the core concepts of the content. As a result, the spoken documents associated with the multimedia content can very likely serve as the key for retrieval and browsing. This paper presents a new approach of hierarchical topic organization and visual...
Learning aspect models with partially labeled data
doi:10.1016/j.patrec.2010.09.004. In this paper, we address the problem of learning aspect models with partially labeled data for the task of document categorization. The motivation of this w...